Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

Authors

  • Michael E. Saks
  • Seshadhri Comandur
Abstract

Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array of length n provides a (1 + δ)-multiplicative approximation to the distance to monotonicity (n minus the length of the LIS), and uses only O((log n)/δ) space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. The improved approximation factor reflects a qualitative difference between our algorithm and previous algorithms: previous polylogarithmic space algorithms could not reliably detect increasing subsequences of length as large as n/2, while ours can detect increasing subsequences of length βn for any β > 0. More precisely, our algorithm can be used to estimate the length of the LIS to within an additive δn for any δ > 0, while previous algorithms could only achieve additive error n(1/2 − o(1)). Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also show how our technique can be applied to other problems solvable by dynamic programs. For example, we give a streaming algorithm for approximating LCS(x, y), the length of the longest common subsequence between strings x and y, each of length n. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to y and streaming access to x, and runs in small space provided that no single symbol appears very often in y.
More precisely, it gives an additive-δn approximation to LCS(x, y) (and hence also to E(x, y) = n − LCS(x, y), the edit distance between x and y when insertions and deletions, but not substitutions, are allowed), with space complexity O(k(log n)/δ), where k is the maximum number of times any one symbol appears in y. We also provide a deterministic 1-pass streaming algorithm that outputs a (1 + δ)-multiplicative approximation for E(x, y) (which is also an additive δn-approximation), in the asymmetric setting, and uses O(√((n log n)/δ)) space. All these algorithms are obtained by carefully trading space and accuracy within a standard dynamic program.

This work was supported in part by NSF under CCF 0832787. This work was supported by the Early Career LDRD program at Sandia National Laboratories. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
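For orientation, the exact quantities that the abstract's streaming algorithms approximate can be computed with a classic dynamic program, sketched below in Python. This is not the paper's polylogarithmic-space algorithm: the `tails` table here can grow to linear size. The function names and the `strict` flag are hypothetical, chosen for illustration. The reduction from LCS to LIS over match positions in y is a standard technique and is an assumption about how the two problems relate, not something the abstract states. It does suggest why the number of occurrences k of a symbol in y matters: each symbol read from x triggers up to k table updates.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def lis_length(seq, strict=False):
    """Exact LIS length in one left-to-right pass (linear space in the worst case).

    tails[i] holds the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far.
    strict=False allows equal neighbours (non-decreasing); this default
    is an assumption, since the abstract only says "increasing".
    """
    find = bisect_left if strict else bisect_right
    tails = []
    for value in seq:
        pos = find(tails, value)
        if pos == len(tails):
            tails.append(value)   # extends the longest subsequence found so far
        else:
            tails[pos] = value    # a smaller tail for length pos + 1
    return len(tails)


def distance_to_monotonicity(seq, strict=False):
    """n minus the LIS length, as defined in the abstract."""
    return len(seq) - lis_length(seq, strict)


def lcs_length_asymmetric(x_stream, y):
    """Exact LCS length with one pass over x and random access to y.

    Each symbol of x is replaced by its match positions in y, listed in
    decreasing order; the strictly increasing LIS of that position
    sequence has length LCS(x, y).
    """
    positions = defaultdict(list)
    for j, symbol in enumerate(y):
        positions[symbol].append(j)

    tails = []
    for symbol in x_stream:
        # Decreasing order keeps one x position from matching twice.
        for j in reversed(positions.get(symbol, ())):
            pos = bisect_left(tails, j)
            if pos == len(tails):
                tails.append(j)
            else:
                tails[pos] = j
    return len(tails)
```

For example, `distance_to_monotonicity([3, 1, 2, 5, 4])` deletes two entries to leave `1, 2, 5` (or `1, 2, 4`), and `lcs_length_asymmetric("ABCBDAB", "BDCABA")` finds a common subsequence of length 4 such as `BCBA`.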

Related resources

A polylogarithmic space deterministic streaming algorithm for approximating distance to monotonicity

The distance to monotonicity of a sequence of n numbers is the minimum number of entries whose deletion leaves an increasing sequence. We give the first deterministic streaming algorithm that approximates the distance to monotonicity within a 1 + ε factor for any fixed ε > 0 and runs in space polylogarithmic in the length of the sequence and the range of the numbers. The best previous determini...

Approximating the Longest Increasing Sequence and Distance from Sortedness in a Data Stream

We revisit the well-studied problem of estimating the sortedness of a data stream. We study the complementary problems of estimating the edit distance from sortedness (Ulam distance) and estimating the length of the longest increasing sequence (LIS). We present the first sub-linear space algorithms for these problems in the data stream model. • We give a O(log n) space, one-pass randomized algo...

Edit Distance to Monotonicity in Sliding Windows

Given a stream of items each associated with a numerical value, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing with respect to the numerical value. The space complexity of estimating the edit distance to monotonicity of a data stream has become well understood over the past few years. Motivated by applications on networ...

Online Pattern Matching for String Edit Distance with Moves

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies betwee...

Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we study the computational problem of computing the edit distance between a pair of strings where their distance is bounded by a parameter k ≪ n. We present two strea...

Publication date: 2013